fix(pi07): gate optional prefix tokens in low- and high-level planners#229
Conversation
Backports the #226-equivalent prefix-gating fixes from pi07_paligemma (commit b7d3a82) to pi07's low- and high-level planners, plus updates the affected CPU regression test to match the new layout. Low-level (modeling_pi07_low_level.py::embed_prefix) had three remaining divergences in the all-optionals-dropped case: - state-end was unconditionally ", " (2 tokens) instead of pi05's ":\n". - ";\n " prefix-end was unconditionally appended even with no optional content to terminate, dangling 2 spurious tokens after the state-end. - "Action: " indicator att_masks were [1] + [0]*(N-1), making the indicator a single bidirectional block; pi05 uses [1]*N (one causal block per token), which shifts the cumsum at the indicator -> first- discrete boundary and changes the discrete-action CE loss. Now compute has_any_optional once at the top of embed_prefix and gate the state-end string + the prefix-end block on it. Indicator att_masks switched to [1]*N to match pi05. High-level (modeling_pi07_high_level.py::embed_prefix) had one of the three: ";\n " was emitted unconditionally as if a free-floating block, but it is the metadata -> "Updated Memory:" separator. With no metadata there is nothing to terminate. Wrapped under `if metadata_tokens is not None:`. The "Updated Memory: " AR anchor itself stays unconditional — inference relies on it (memory_tokens is None at inference by design). The existing CPU regression test `test_all_optional_blocks_absent_skips _emission` had the OLD (buggy) layout baked into its assertions; updated to match the new gated layout (11 tokens total, ":\n" state-end, no trailing prefix-end). A byte-identity test against pi06 (analogous to pi07_paligemma vs pi05 in #226) is deliberately deferred to a follow-up issue: pi07_low_level uses continuous state with separate State:/Action: indicators while pi06 inlines discretized state in the language prompt, and pi07_high_level uses a different prompt template, so byte-equivalence isn't feasible without aligning pi06's prompts (or building pi06_mem).
) (cherry picked from commit 35629df)
|
[claude-review] summary for commit bb94735 No blocking issues found. Re-reviewed after two test-only follow-up commits since the prior clean review at
All earlier findings remain resolved (per-token causal |
|
@claude fix per review, except for the |
- addresses @claude (LL embed_prefix discrete_actions test): added test_discrete_actions_indicator_uses_per_token_causal_blocks pinning the [1]*N att_masks pattern on the "Action: " indicator + discrete action span, so a regression to [1]+[0]*(N-1) fails immediately. - addresses @claude (has_subgoal style nit): wrapped m.any() in bool(...) inside the any(...) generator so has_subgoal materializes Python bools matching has_response/has_metadata. tests: passed - pytest -m "not gpu" -n auto tests/policies/ Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
(cherry picked from commit 16c46e8)
`test_complete_pi06_pipeline_integration_smoke` was added in #178 with chunk_size=10 but the shared `lerobot_dataset_metadata` fixture provides actions stats shaped (50, 32) — matching the default PI06Config. The Normalize buffer was therefore (50, 32) while the test batch's actions were (B, 10, 32), and MIN_MAX normalization in normalize.py:232 raised ``RuntimeError: The size of tensor a (10) must match the size of tensor b (50) at non-singleton dimension 1``. Pre-existing bug — never caught in CI because the test is gated by @pytest.mark.gpu and skipped in CPU runs. Surfaced now while validating this PR's SDPA + grad-ckpt port on a real GPU. Fix by deep-copying the fixture stats and reshaping the actions max/mean/min/std arrays from (50, 32) to (chunk_size, 32) before constructing the policy. Same numeric values, just the right shape. (cherry picked from commit 6425cb4)
`_verify_position_ids` and `_verify_action_expert_attention_mask` in test_pi07_low_level_planner.py only excluded DISCRETE_ACTION_MAX_LENGTH from the cross-attention prefix slice. The model's forward path (modeling_pi07_low_level.py:1575-1620) excludes both the "Action: " indicator len (3 tokens via Gemma 3 tokenizer) AND discrete_action_max_length, matching pi05's discrete_action_indicator_max_length logic — so the action expert sees the same prefix length at train and inference. The test was a latent break since 2949e73 ("Applying pi07_paligemma fixes to pi07") added the indicator subtraction to the model but never updated the test. Surfaces as soon as the GPU test test_complete_pi07_low_level_pipeline runs and asserts suffix_position_ids — actual was prefix_offset+(token), expected was prefix_offset+indicator_len+(token), 3-token gap on Gemma 3. Pull the indicator length from the existing _indicator_lens helper so the fix tracks future tokenizer changes automatically. Plumb the tokenizer through the two _verify_action_expert_attention_mask call sites.
Brings 16 main commits onto feat/pi07 in preparation for opening the feat/pi07 → main PR. One conflict (tests/policies/test_pi06.py: np.tile vs np.full for chunk-size action stats) — kept HEAD's np.tile fix from claude review feedback on #197, which preserves per-feature variance instead of collapsing every step to stats.flatten()[0]. Also dropped stale --ignore=tests/policies/test_pi07_paligemma_low_level_planner.py from .github/workflows/cpu_test.yml and gpu_test.yml: main deleted the file in #234, but the --ignore line on feat/pi07 (added in #229) was on a different part of the file so git auto-merged without flagging it. Removing both now keeps the workflows clean.
What this does
Backports the #226-equivalent prefix-gating fixes from
pi07_paligemma(commitb7d3a82onclaude/fix-issue-226-gt8z3, merging intofix/pi07_paligemma_fixesthen intomain) topi07's low- and high-level planners.Low-level planner (
pi07/low_level_planner/modeling_pi07_low_level.py::embed_prefix) had three remaining divergences when all optional middle blocks (response / subgoal / metadata) are dropped:", "(2 tokens) instead of":\n".";\n "prefix-end was unconditionally appended even with no optional content to terminate, dangling 2 spurious tokens after the state-end."Action: "indicator att_masks were[1] + [0]*(N-1)(single bidirectional block);pi05uses[1]*N(one causal block per token), which shifts the cumsum at the indicator → first-discrete boundary and changes the discrete-action CE loss.The fix computes
has_any_optionalonce at the top ofembed_prefixand gates the state-end string + the prefix-end block on it. Indicator att_masks switched to[1]*Nto matchpi05. (The earlier2949e73commit only ported the response/subgoal/metadata content gating, not the surrounding separators or indicator att_masks.)High-level planner (
pi07/high_level_planner/modeling_pi07_high_level.py::embed_prefix) had one of the three issues —";\n "was emitted unconditionally, but it is themetadata → "Updated Memory:"separator. With no metadata there's nothing to terminate. Wrapped underif metadata_tokens is not None:. The"Updated Memory: "AR anchor itself stays unconditional because inference relies on it (memory_tokens is None at inference by design).A byte-identity test against
pi06(analogous to thepi07_paligemmavspi05test in #226) is deliberately deferred to a follow-up issue (#230):pi07_low_leveluses continuous state with separateState:/Action:indicators whilepi06inlines discretized state in the language prompt, andpi07_high_leveluses a different prompt template ("Task: {task}, Past Memory: {past_mem}, State: {state}, "vspi06's"Task: {task}<eos>State: {state}<eos>Actions:"), so byte-equivalence isn't feasible without aligningpi06's prompts (or buildingpi06_mem).Drive-by fixes pulled from
mainto unblock CIfeat(ci): pin Claude workflows to Opus 4.7 1M, allow test-skip on doc fixes (#208)— workflow file must matchmainto satisfy GitHub workflow validation; was failing theClaude PR Reviewcheck.fix(ci): gate extract-claude-lessons on head branch, not PR author (#212)— same reason.ci: skip pi07_paligemma low-level test (broken at import) to unblock CPU CI (#209)—feat/pi07was missing the--ignore=tests/policies/test_pi07_paligemma_low_level_planner.pyflag incpu_test.yml; the test file fails at import (VJEPA2VideoEncoderremoved inpi05_mem) and that crashed CPU CI.ci: ignore broken pi07 test in GPU CI; dedupe TODO in cpu_test (#217)— same flag missing ingpu_test.yml+ dedup.fix(test): override actions stats shape in pi06 GPU smoke (PR #214)—test_complete_pi06_pipeline_integration_smokewas failing on real GPU because the fixture stats shape(50, 32)didn't matchchunk_size=10. Pre-existing latent bug; surfaced when running the GPU suite end-to-end onmlbox.Test fix surfaced by the GPU run
_verify_position_idsand_verify_action_expert_attention_maskintest_pi07_low_level_planner.pyonly excludedDISCRETE_ACTION_MAX_LENGTHfrom the cross-attention prefix slice. The model (post-2949e73) already excludes the"Action: "indicator length (3 tokens via Gemma 3 tokenizer) too, matchingpi05'sdiscrete_action_indicator_max_lengthlogic — so the action expert sees the same prefix length at train and inference. The 3-token gap fails thesuffix_position_idsassertion as soon as the GPU integration test runs. Latent break since2949e73; never caught because the test is gated by@pytest.mark.gpu.How it was tested
pre-commit run --all-files— all hooks pass.pytest -m "not gpu" -n auto tests/policies/): 125 passed, 2 skipped, 1 pre-existing collection error intest_pi07_paligemma_low_level_planner.py(now excluded from CI by the--ignoreflag pulled from ci: skip pi07_paligemma low-level test (broken at import) to unblock CPU CI #209/ci: ignore broken pi07 test in GPU CI; dedupe TODO in cpu_test #217).mlbox(RTX 5090, 32 GB):pytest -m gpu -n 0 -v --ignore=tests/policies/test_pi07_paligemma_low_level_planner.py tests/→ 20 passed, 2 skipped, 786 deselected, 0 failures, no OOM, 6m10s. Includestests/policies/test_pi07_low_level_planner.py::test_complete_pi07_low_level_pipeline,test_pi07_high_level_planner.py::test_complete_pi07_high_level_planner_pipeline, the regression class, and pi05 / pi05_mem / pi06 / pi0 / value smoke tests.tests/policies/test_pi07_cpu.py::TestEmbedPrefixConditionalGuardsexercises the gating:test_all_optional_blocks_absent_skips_emissionwas updated to match the new layout (11 tokens,":\n"state-end, no trailing prefix-end);test_mixed_subgoal_availability_zeros_pad_only_samplesandtest_response_mask_any_true_emits_blockstill pass because they populate at least one optional, sohas_any_optional == Trueand the layout matches the old expectations.regression_test.yml) are deferred to the existing schedule.How to checkout & try? (for the reviewer)
git fetch origin git checkout claude/fix-pi07-prefix-gating uv sync --extra dev pytest -m "not gpu" -n auto tests/policies/test_pi07_cpu.py tests/policies/test_pi07_low_level_planner.py tests/policies/test_pi07_high_level_planner.pyGPU subset:
To re-verify the diff against the canonical
pi07_paligemmafix:Checklist
GPU pytests run on
mlbox(RTX 5090) — see "How it was tested" above. Nightly regression tests are deferred to the existing CI schedule. A follow-up issue (#230) is filed forpi06prompt alignment + future byte-identity tests againstpi07planners.Note: Before submitting this PR, please read the contributor guideline.